Overview

Dataset statistics

Number of variables: 22
Number of observations: 113066
Missing cells: 190746
Missing cells (%): 7.7%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 65.5 MiB
Average record size in memory: 607.7 B
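
The percentages and averages above follow from the raw counts. As a minimal sketch (the function names are illustrative and not part of pandas-profiling), the missing-cell share is the missing count divided by rows times columns, and the average record size is the total memory divided by the number of rows. Because the total size is only reported to one decimal place, the recomputed record size lands close to, but not exactly on, the reported 607.7 B.

```python
def missing_cell_pct(missing_cells: int, n_rows: int, n_cols: int) -> float:
    """Share of empty cells, in percent."""
    return 100.0 * missing_cells / (n_rows * n_cols)


def avg_record_bytes(total_mib: float, n_rows: int) -> float:
    """Average bytes per row, given total memory in MiB (1 MiB = 1024**2 bytes)."""
    return total_mib * 1024**2 / n_rows


pct = missing_cell_pct(190746, 113066, 22)
size = avg_record_bytes(65.5, 113066)
print(f"{pct:.1f}%")    # 7.7%
print(f"{size:.1f} B")  # close to the reported 607.7 B (input is rounded)
```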

Variable types

Numeric: 11
Categorical: 9
Boolean: 2

Alerts

ListingKey has a high cardinality: 113066 distinct values [High cardinality]
ListingCreationDate has a high cardinality: 113064 distinct values [High cardinality]
ClosedDate has a high cardinality: 2802 distinct values [High cardinality]
LoanOriginalAmount is highly correlated with MonthlyLoanPayment [High correlation]
MonthlyLoanPayment is highly correlated with LoanOriginalAmount [High correlation]
BorrowerAPR is highly correlated with BorrowerRate [High correlation]
BorrowerRate is highly correlated with BorrowerAPR and 2 other fields [High correlation]
CreditScoreRangeLower is highly correlated with BorrowerRate and 1 other field [High correlation]
CreditScoreRangeUpper is highly correlated with BorrowerRate and 1 other field [High correlation]
LoanOriginalAmount is highly correlated with MonthlyLoanPayment [High correlation]
MonthlyLoanPayment is highly correlated with LoanOriginalAmount [High correlation]
BorrowerAPR is highly correlated with BorrowerRate [High correlation]
BorrowerRate is highly correlated with BorrowerAPR [High correlation]
CreditScoreRangeLower is highly correlated with CreditScoreRangeUpper [High correlation]
CreditScoreRangeUpper is highly correlated with CreditScoreRangeLower [High correlation]
IncomeVerifiable is highly correlated with DebtToIncomeRatio [High correlation]
DebtToIncomeRatio is highly correlated with IncomeVerifiable [High correlation]
LoanOriginalAmount is highly correlated with MonthlyLoanPayment [High correlation]
MonthlyLoanPayment is highly correlated with LoanOriginalAmount [High correlation]
BorrowerAPR is highly correlated with BorrowerRate [High correlation]
BorrowerRate is highly correlated with BorrowerAPR [High correlation]
CreditScoreRangeLower is highly correlated with CreditScoreRangeUpper [High correlation]
CreditScoreRangeUpper is highly correlated with CreditScoreRangeLower [High correlation]
LoanStatus is highly correlated with Term and 1 other field [High correlation]
Term is highly correlated with LoanStatus [High correlation]
LoanOriginalAmount is highly correlated with MonthlyLoanPayment and 1 other field [High correlation]
MonthlyLoanPayment is highly correlated with LoanOriginalAmount [High correlation]
BorrowerAPR is highly correlated with BorrowerRate and 2 other fields [High correlation]
BorrowerRate is highly correlated with BorrowerAPR and 2 other fields [High correlation]
CreditGrade is highly correlated with LoanOriginalAmount and 4 other fields [High correlation]
ProsperRating (Alpha) is highly correlated with BorrowerAPR and 1 other field [High correlation]
CreditScoreRangeLower is highly correlated with CreditGrade and 1 other field [High correlation]
CreditScoreRangeUpper is highly correlated with CreditGrade and 1 other field [High correlation]
IncomeRange is highly correlated with EmploymentStatus [High correlation]
IncomeVerifiable is highly correlated with DebtToIncomeRatio and 1 other field [High correlation]
DebtToIncomeRatio is highly correlated with IncomeVerifiable [High correlation]
EmploymentStatus is highly correlated with LoanStatus and 2 other fields [High correlation]
ClosedDate has 57990 (51.3%) missing values [Missing]
CreditGrade has 84113 (74.4%) missing values [Missing]
ProsperRating (Alpha) has 29084 (25.7%) missing values [Missing]
DebtToIncomeRatio has 8472 (7.5%) missing values [Missing]
EmploymentStatus has 2255 (2.0%) missing values [Missing]
EmploymentStatusDuration has 7625 (6.7%) missing values [Missing]
StatedMonthlyIncome is highly skewed (γ1 = 125.0987676) [Skewed]
df_index is uniformly distributed [Uniform]
ListingKey is uniformly distributed [Uniform]
ListingCreationDate is uniformly distributed [Uniform]
df_index has unique values [Unique]
ListingKey has unique values [Unique]
ListingCategory (numeric) has 16965 (15.0%) zeros [Zeros]
StatedMonthlyIncome has 1394 (1.2%) zeros [Zeros]
EmploymentStatusDuration has 1503 (1.3%) zeros [Zeros]
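
The γ1 in the Skewed alert is a skewness coefficient. The sketch below computes the plain population form, g1 = m3 / m2^1.5, on two toy samples (made-up numbers, not taken from the dataset). It is an illustration only: the report's value most likely comes from pandas, whose estimator applies a small-sample bias correction, so results on the same data can differ slightly from this formula.

```python
def skewness(values):
    """Population skewness g1 = m3 / m2**1.5 (no small-sample correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


symmetric = [1, 2, 3, 4, 5]      # toy data: symmetric around the mean
long_tail = [1, 1, 1, 1, 100]    # toy data: one large value stretches the right tail

print(skewness(symmetric))  # zero for a symmetric sample
print(skewness(long_tail))  # positive: the right tail dominates
```

A value as large as the 125.1 reported for StatedMonthlyIncome points to a handful of extreme incomes far above the bulk of the data.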

Reproduction

Analysis started: 2022-08-25 16:15:53.172328
Analysis finished: 2022-08-25 16:16:41.483769
Duration: 48.31 seconds
Software version: pandas-profiling v3.2.0
Download configuration: config.json
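
The reported duration is simply the difference between the two timestamps. A short check using only the standard library:

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"
started = datetime.strptime("2022-08-25 16:15:53.172328", FMT)
finished = datetime.strptime("2022-08-25 16:16:41.483769", FMT)

duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")  # 48.31 seconds
```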

Variables

df_index
Real number (ℝ≥0)

UNIFORM
UNIQUE

Distinct: 113066
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 56854.43196
Minimum: 0
Maximum: 113936
Zeros: 1
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 883.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 5657.25
Q1: 28353.25
median: 56781.5
Q3: 85339.75
95-th percentile: 108209.75
Maximum: 113936
Range: 113936
Interquartile range (IQR): 56986.5

Descriptive statistics

Standard deviation: 32897.80264
Coefficient of variation (CV): 0.5786321576
Kurtosis: -1.200343729
Mean: 56854.43196
Median Absolute Deviation (MAD): 28493
Skewness: 0.004341561274
Sum: 6428303204
Variance: 1082265418
Monotonicity: Strictly increasing
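
Several of these figures are derived from one another. The sketch below recomputes three of them from the values reported for df_index. The relationships (CV = standard deviation / mean, IQR = Q3 - Q1, variance = standard deviation squared) are the standard definitions, and small differences in the last digits come from rounding in the report.

```python
mean, std = 56854.43196, 32897.80264
q1, q3 = 28353.25, 85339.75

cv = std / mean        # coefficient of variation
iqr = q3 - q1          # interquartile range
variance = std ** 2    # variance is the squared standard deviation

print(round(cv, 4))        # 0.5786
print(iqr)                 # 56986.5
print(f"{variance:.4g}")   # 1.082e+09
```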
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
0 | 1 | < 0.1%
75811 | 1 | < 0.1%
75822 | 1 | < 0.1%
75821 | 1 | < 0.1%
75820 | 1 | < 0.1%
75819 | 1 | < 0.1%
75818 | 1 | < 0.1%
75817 | 1 | < 0.1%
75816 | 1 | < 0.1%
75815 | 1 | < 0.1%
Other values (113056) | 113056 | > 99.9%

Value | Count | Frequency (%)
0 | 1 | < 0.1%
1 | 1 | < 0.1%
2 | 1 | < 0.1%
3 | 1 | < 0.1%
4 | 1 | < 0.1%
5 | 1 | < 0.1%
6 | 1 | < 0.1%
7 | 1 | < 0.1%
8 | 1 | < 0.1%
10 | 1 | < 0.1%

Value | Count | Frequency (%)
113936 | 1 | < 0.1%
113935 | 1 | < 0.1%
113934 | 1 | < 0.1%
113933 | 1 | < 0.1%
113932 | 1 | < 0.1%
113931 | 1 | < 0.1%
113930 | 1 | < 0.1%
113929 | 1 | < 0.1%
113928 | 1 | < 0.1%
113927 | 1 | < 0.1%
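
df_index is unique and strictly increasing, yet its maximum (113936) is larger than the number of rows (113066), so some integers in the range never occur. The sketch below counts them and finds the one gap visible among the ten smallest values listed above. It assumes the index holds integers only, as every listed value suggests. The report does not say why values are absent; rows dropped before profiling would be one plausible reading.

```python
n_rows = 113066
index_min, index_max = 0, 113936

possible = index_max - index_min + 1   # integers in [min, max]
absent = possible - n_rows             # integers that never occur
print(possible, absent)                # 113937 871

smallest = [0, 1, 2, 3, 4, 5, 6, 7, 8, 10]  # ten smallest values from the report
gaps = sorted(set(range(smallest[0], smallest[-1] + 1)) - set(smallest))
print(gaps)                            # [9]
```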

ListingKey
Categorical

HIGH CARDINALITY
UNIFORM
UNIQUE

Distinct: 113066
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 8.6 MiB

Value | Count
1021339766868145413AB3B | 1
F0663582993853438C8A8E0 | 1
66983585151599608A5ABC6 | 1
66953459735530674736867 | 1
6693339801389188068FB4E | 1
Other values (113061) | 113061

Length

Max length: 23
Median length: 23
Mean length: 23
Min length: 23

Characters and Unicode

Total characters: 2600518
Distinct characters: 16
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 113066
Unique (%): 100.0%

Sample

1st row: 1021339766868145413AB3B
2nd row: 10273602499503308B223C1
3rd row: 0EE9337825851032864889A
4th row: 0EF5356002482715299901A
5th row: 0F023589499656230C5E3E2
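
The length and character figures suggest that every ListingKey is a fixed-width string of 23 characters drawn from the digits 0-9 and the uppercase letters A-F (16 distinct characters). A small validator under that assumption, checked against the five sample rows; the helper name is illustrative:

```python
import string

HEX_UPPER = set(string.digits + "ABCDEF")


def looks_like_listing_key(key: str) -> bool:
    """True if key has 23 characters, all drawn from 0-9 and A-F."""
    return len(key) == 23 and set(key) <= HEX_UPPER


samples = [
    "1021339766868145413AB3B",
    "10273602499503308B223C1",
    "0EE9337825851032864889A",
    "0EF5356002482715299901A",
    "0F023589499656230C5E3E2",
]
all_valid = all(looks_like_listing_key(k) for k in samples)
print(all_valid)     # True
print(23 * 113066)   # 2600518, the reported total character count
```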

Common Values

ValueCountFrequency (%)
1021339766868145413AB3B1
 
< 0.1%
F0663582993853438C8A8E01
 
< 0.1%
66983585151599608A5ABC61
 
< 0.1%
669534597355306747368671
 
< 0.1%
6693339801389188068FB4E1
 
< 0.1%
6AB735643902836208D76F31
 
< 0.1%
6AAB34147517188050BD9611
 
< 0.1%
6AA0359887299121671E4241
 
< 0.1%
F7E3359743115455359D06A1
 
< 0.1%
F7E13468604552859BD924B1
 
< 0.1%
Other values (113056)113056
> 99.9%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
1021339766868145413ab3b1
 
< 0.1%
0ffc35866018516621b0d3f1
 
< 0.1%
0f1035772717087366f9ea71
 
< 0.1%
0f043596202561788ea13d51
 
< 0.1%
0f123545674891886d9f1061
 
< 0.1%
0f1734025150298088a5f2b1
 
< 0.1%
0f1a3597143888805163ef71
 
< 0.1%
0f1c3583260311305d68f871
 
< 0.1%
0fbc3556025226720be6dd41
 
< 0.1%
0f353575943675863d1afc01
 
< 0.1%
Other values (113056)113056
> 99.9%

Most occurring characters

ValueCountFrequency (%)
3318863
12.3%
5259829
10.0%
4212165
8.2%
9206809
8.0%
6201381
7.7%
8200419
7.7%
0198411
7.6%
7195121
7.5%
2192853
7.4%
1191444
7.4%
Other values (6)423223
16.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number2177295
83.7%
Uppercase Letter423223
 
16.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
3318863
14.6%
5259829
11.9%
4212165
9.7%
9206809
9.5%
6201381
9.2%
8200419
9.2%
0198411
9.1%
7195121
9.0%
2192853
8.9%
1191444
8.8%
Uppercase Letter
ValueCountFrequency (%)
D70894
16.8%
C70794
16.7%
A70479
16.7%
F70423
16.6%
E70415
16.6%
B70218
16.6%

Most occurring scripts

ValueCountFrequency (%)
Common2177295
83.7%
Latin423223
 
16.3%

Most frequent character per script

Common
ValueCountFrequency (%)
3318863
14.6%
5259829
11.9%
4212165
9.7%
9206809
9.5%
6201381
9.2%
8200419
9.2%
0198411
9.1%
7195121
9.0%
2192853
8.9%
1191444
8.8%
Latin
ValueCountFrequency (%)
D70894
16.8%
C70794
16.7%
A70479
16.7%
F70423
16.6%
E70415
16.6%
B70218
16.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII2600518
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
3318863
12.3%
5259829
10.0%
4212165
8.2%
9206809
8.0%
6201381
7.7%
8200419
7.7%
0198411
7.6%
7195121
7.5%
2192853
7.4%
1191444
7.4%
Other values (6)423223
16.3%

ListingCreationDate
Categorical

HIGH CARDINALITY
UNIFORM

Distinct: 113064
Distinct (%): > 99.9%
Missing: 0
Missing (%): 0.0%
Memory size: 9.3 MiB

Value | Count
2013-06-03 17:27:50.540000000 | 2
2012-10-20 12:21:46.333000000 | 2
2007-08-26 19:09:29.263000000 | 1
2012-01-08 05:15:06.027000000 | 1
2014-01-14 08:01:37.673000000 | 1
Other values (113059) | 113059

Length

Max length: 29
Median length: 29
Mean length: 28.96621442
Min length: 19
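
The mean length sits just below the maximum, which hints that a small number of timestamps lack the fractional-seconds part. Assuming only the two lengths 29 and 19 occur (an assumption, though the total character count of 3275094 reported for this variable is consistent with it), the number of short strings can be recovered from the mean:

```python
n_rows = 113066
mean_len, long_len, short_len = 28.96621442, 29, 19

# With only two lengths present:
#   mean_len = long_len - (long_len - short_len) * n_short / n_rows
n_short = round((long_len - mean_len) * n_rows / (long_len - short_len))
print(n_short)       # 382

total_chars = long_len * (n_rows - n_short) + short_len * n_short
print(total_chars)   # 3275094
```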

Characters and Unicode

Total characters: 3275094
Distinct characters: 14
Distinct categories: 4
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 113062
Unique (%): > 99.9%

Sample

1st row: 2007-08-26 19:09:29.263000000
2nd row: 2014-02-27 08:28:07.900000000
3rd row: 2007-01-05 15:00:47.090000000
4th row: 2012-10-22 11:02:35.010000000
5th row: 2013-09-14 18:38:39.097000000
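
ListingCreationDate is profiled as Categorical because the values are stored as strings. One way to turn such a string into a real timestamp with the standard library is sketched below. The helper name is hypothetical, and the nine fractional digits are truncated to six because `%f` in `strptime` accepts at most microsecond precision. Strings without a fractional part (the 19-character form) are handled as well.

```python
from datetime import datetime


def parse_listing_timestamp(text: str) -> datetime:
    """Parse 'YYYY-MM-DD HH:MM:SS' with an optional fractional part of any length."""
    if "." in text:
        head, frac = text.split(".", 1)
        # %f accepts at most six digits, so drop anything finer than microseconds.
        return datetime.strptime(f"{head}.{frac[:6]}", "%Y-%m-%d %H:%M:%S.%f")
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")


with_fraction = parse_listing_timestamp("2007-08-26 19:09:29.263000000")
without_fraction = parse_listing_timestamp("2012-10-22 11:02:35")
print(with_fraction)      # 2007-08-26 19:09:29.263000
print(without_fraction)   # 2012-10-22 11:02:35
```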

Common Values

ValueCountFrequency (%)
2013-06-03 17:27:50.5400000002
 
< 0.1%
2012-10-20 12:21:46.3330000002
 
< 0.1%
2007-08-26 19:09:29.2630000001
 
< 0.1%
2012-01-08 05:15:06.0270000001
 
< 0.1%
2014-01-14 08:01:37.6730000001
 
< 0.1%
2011-11-23 11:47:38.7900000001
 
< 0.1%
2013-07-26 15:38:41.6370000001
 
< 0.1%
2009-08-04 11:51:49.6300000001
 
< 0.1%
2007-08-22 15:47:58.4170000001
 
< 0.1%
2012-11-19 06:45:40.8670000001
 
< 0.1%
Other values (113054)113054
> 99.9%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
2013-11-04295
 
0.1%
2013-12-03271
 
0.1%
2014-01-08268
 
0.1%
2013-12-02265
 
0.1%
2013-12-05249
 
0.1%
2013-09-16244
 
0.1%
2013-09-17242
 
0.1%
2013-12-04242
 
0.1%
2014-01-13240
 
0.1%
2014-01-15239
 
0.1%
Other values (115411)223577
98.9%

Most occurring characters

ValueCountFrequency (%)
01132112
34.6%
1357676
 
10.9%
2303882
 
9.3%
-226132
 
6.9%
:226132
 
6.9%
3189101
 
5.8%
7126992
 
3.9%
4121271
 
3.7%
113066
 
3.5%
.112684
 
3.4%
Other values (4)366046
 
11.2%

Most occurring categories

ValueCountFrequency (%)
Decimal Number2597080
79.3%
Other Punctuation338816
 
10.3%
Dash Punctuation226132
 
6.9%
Space Separator113066
 
3.5%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
01132112
43.6%
1357676
 
13.8%
2303882
 
11.7%
3189101
 
7.3%
7126992
 
4.9%
4121271
 
4.7%
5112364
 
4.3%
889284
 
3.4%
683560
 
3.2%
980838
 
3.1%
Other Punctuation
ValueCountFrequency (%)
:226132
66.7%
.112684
33.3%
Dash Punctuation
ValueCountFrequency (%)
-226132
100.0%
Space Separator
ValueCountFrequency (%)
113066
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common3275094
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
01132112
34.6%
1357676
 
10.9%
2303882
 
9.3%
-226132
 
6.9%
:226132
 
6.9%
3189101
 
5.8%
7126992
 
3.9%
4121271
 
3.7%
113066
 
3.5%
.112684
 
3.4%
Other values (4)366046
 
11.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII3275094
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
01132112
34.6%
1357676
 
10.9%
2303882
 
9.3%
-226132
 
6.9%
:226132
 
6.9%
3189101
 
5.8%
7126992
 
3.9%
4121271
 
3.7%
113066
 
3.5%
.112684
 
3.4%
Other values (4)366046
 
11.2%

ClosedDate
Categorical

HIGH CARDINALITY
MISSING

Distinct: 2802
Distinct (%): 5.1%
Missing: 57990
Missing (%): 51.3%
Memory size: 5.8 MiB

Value | Count
2014-03-04 00:00:00 | 105
2014-02-19 00:00:00 | 100
2014-02-11 00:00:00 | 92
2012-10-30 00:00:00 | 81
2013-02-26 00:00:00 | 78
Other values (2797) | 54620
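
The two percentages for ClosedDate appear to use different denominators: the missing share is relative to all rows, while the distinct share matches the reported 5.1% only when computed over the non-missing rows. A quick check of that reading:

```python
n_rows, n_missing, n_distinct = 113066, 57990, 2802

present = n_rows - n_missing
missing_pct = 100 * n_missing / n_rows
distinct_pct = 100 * n_distinct / present          # relative to non-missing rows
distinct_pct_all_rows = 100 * n_distinct / n_rows  # alternative reading, for contrast

print(present)                          # 55076
print(f"{missing_pct:.1f}%")            # 51.3%
print(f"{distinct_pct:.1f}%")           # 5.1%
print(f"{distinct_pct_all_rows:.1f}%")  # 2.5%
```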

Length

Max length: 29
Median length: 19
Mean length: 19.00363135
Min length: 19

Characters and Unicode

Total characters: 1046644
Distinct characters: 14
Distinct categories: 4
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 110
Unique (%): 0.2%

Sample

1st row: 2009-08-14 00:00:00
2nd row: 2009-12-17 00:00:00
3rd row: 2008-01-07 00:00:00
4th row: 2012-12-19 00:00:00
5th row: 2008-05-22 00:00:00

Common Values

ValueCountFrequency (%)
2014-03-04 00:00:00105
 
0.1%
2014-02-19 00:00:00100
 
0.1%
2014-02-11 00:00:0092
 
0.1%
2012-10-30 00:00:0081
 
0.1%
2013-02-26 00:00:0078
 
0.1%
2014-01-30 00:00:0076
 
0.1%
2014-01-14 00:00:0075
 
0.1%
2014-02-18 00:00:0072
 
0.1%
2014-02-24 00:00:0072
 
0.1%
2014-02-04 00:00:0071
 
0.1%
Other values (2792)54254
48.0%
(Missing)57990
51.3%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
00:00:0055056
50.0%
2014-03-04105
 
0.1%
2014-02-19100
 
0.1%
2014-02-1192
 
0.1%
2012-10-3081
 
0.1%
2013-02-2678
 
0.1%
2014-01-3076
 
0.1%
2014-01-1475
 
0.1%
2014-02-1872
 
0.1%
2014-02-2472
 
0.1%
Other values (2793)54345
49.3%

Most occurring characters

ValueCountFrequency (%)
0478194
45.7%
-110152
 
10.5%
:110152
 
10.5%
296515
 
9.2%
192206
 
8.8%
55076
 
5.3%
325479
 
2.4%
918106
 
1.7%
815827
 
1.5%
713251
 
1.3%
Other values (4)31686
 
3.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number771244
73.7%
Other Punctuation110172
 
10.5%
Dash Punctuation110152
 
10.5%
Space Separator55076
 
5.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0478194
62.0%
296515
 
12.5%
192206
 
12.0%
325479
 
3.3%
918106
 
2.3%
815827
 
2.1%
713251
 
1.7%
412329
 
1.6%
69933
 
1.3%
59404
 
1.2%
Other Punctuation
ValueCountFrequency (%)
:110152
> 99.9%
.20
 
< 0.1%
Dash Punctuation
ValueCountFrequency (%)
-110152
100.0%
Space Separator
ValueCountFrequency (%)
55076
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common1046644
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0478194
45.7%
-110152
 
10.5%
:110152
 
10.5%
296515
 
9.2%
192206
 
8.8%
55076
 
5.3%
325479
 
2.4%
918106
 
1.7%
815827
 
1.5%
713251
 
1.3%
Other values (4)31686
 
3.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII1046644
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0478194
45.7%
-110152
 
10.5%
:110152
 
10.5%
296515
 
9.2%
192206
 
8.8%
55076
 
5.3%
325479
 
2.4%
918106
 
1.7%
815827
 
1.5%
713251
 
1.3%
Other values (4)31686
 
3.0%

LoanStatus
Categorical

HIGH CORRELATION

Distinct: 12
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 7.0 MiB

Value | Count
Current | 55730
Completed | 38061
Chargedoff | 11992
Defaulted | 5018
Past Due (1-15 days) | 800
Other values (7) | 1465

Length

Max length: 22
Median length: 21
Mean length: 8.357393027
Min length: 7

Characters and Unicode

Total characters: 944937
Distinct characters: 35
Distinct categories: 8
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Completed
2nd row: Current
3rd row: Completed
4th row: Current
5th row: Current

Common Values

Value | Count | Frequency (%)
Current | 55730 | 49.3%
Completed | 38061 | 33.7%
Chargedoff | 11992 | 10.6%
Defaulted | 5018 | 4.4%
Past Due (1-15 days) | 800 | 0.7%
Past Due (31-60 days) | 361 | 0.3%
Past Due (61-90 days) | 311 | 0.3%
Past Due (91-120 days) | 304 | 0.3%
Past Due (16-30 days) | 265 | 0.2%
FinalPaymentInProgress | 203 | 0.2%
Other values (2) | 21 | < 0.1%
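
The frequency column is each count divided by the 113066 observations. The sketch below rebuilds the shares from the counts listed for LoanStatus and totals the explicitly listed "Past Due" buckets. The two statuses folded into "Other values (2)" are not named in this table, so they are kept as a single entry here.

```python
counts = {
    "Current": 55730,
    "Completed": 38061,
    "Chargedoff": 11992,
    "Defaulted": 5018,
    "Past Due (1-15 days)": 800,
    "Past Due (31-60 days)": 361,
    "Past Due (61-90 days)": 311,
    "Past Due (91-120 days)": 304,
    "Past Due (16-30 days)": 265,
    "FinalPaymentInProgress": 203,
    "Other values (2)": 21,
}

total = sum(counts.values())
shares = {status: round(100 * n / total, 1) for status, n in counts.items()}
past_due_listed = sum(n for s, n in counts.items() if s.startswith("Past Due"))

print(total)               # 113066
print(shares["Current"])   # 49.3
print(past_due_listed)     # 2041
```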

Length

Histogram of lengths of the category
ValueCountFrequency (%)
current55730
46.7%
completed38061
31.9%
chargedoff11992
 
10.1%
defaulted5018
 
4.2%
past2057
 
1.7%
due2057
 
1.7%
days2057
 
1.7%
1-15800
 
0.7%
31-60361
 
0.3%
61-90311
 
0.3%
Other values (5)793
 
0.7%

Most occurring characters

ValueCountFrequency (%)
e156353
16.5%
r123858
13.1%
C105788
11.2%
t101069
10.7%
u62805
6.6%
d57133
 
6.0%
n56344
 
6.0%
o50256
 
5.3%
l43292
 
4.6%
m38264
 
4.0%
Other values (25)149775
15.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter809147
85.6%
Uppercase Letter115732
 
12.2%
Decimal Number7716
 
0.8%
Space Separator6171
 
0.7%
Open Punctuation2057
 
0.2%
Close Punctuation2057
 
0.2%
Dash Punctuation2041
 
0.2%
Math Symbol16
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e156353
19.3%
r123858
15.3%
t101069
12.5%
u62805
7.8%
d57133
 
7.1%
n56344
 
7.0%
o50256
 
6.2%
l43292
 
5.4%
m38264
 
4.7%
p38061
 
4.7%
Other values (8)81712
10.1%
Decimal Number
ValueCountFrequency (%)
13161
41.0%
01257
 
16.3%
6937
 
12.1%
5800
 
10.4%
3626
 
8.1%
9615
 
8.0%
2320
 
4.1%
Uppercase Letter
ValueCountFrequency (%)
C105788
91.4%
D7075
 
6.1%
P2463
 
2.1%
F203
 
0.2%
I203
 
0.2%
Space Separator
ValueCountFrequency (%)
6171
100.0%
Open Punctuation
ValueCountFrequency (%)
(2057
100.0%
Close Punctuation
ValueCountFrequency (%)
)2057
100.0%
Dash Punctuation
ValueCountFrequency (%)
-2041
100.0%
Math Symbol
ValueCountFrequency (%)
>16
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin924879
97.9%
Common20058
 
2.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
e156353
16.9%
r123858
13.4%
C105788
11.4%
t101069
10.9%
u62805
6.8%
d57133
 
6.2%
n56344
 
6.1%
o50256
 
5.4%
l43292
 
4.7%
m38264
 
4.1%
Other values (13)129717
14.0%
Common
ValueCountFrequency (%)
6171
30.8%
13161
15.8%
(2057
 
10.3%
)2057
 
10.3%
-2041
 
10.2%
01257
 
6.3%
6937
 
4.7%
5800
 
4.0%
3626
 
3.1%
9615
 
3.1%
Other values (2)336
 
1.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII944937
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e156353
16.5%
r123858
13.1%
C105788
11.2%
t101069
10.7%
u62805
6.6%
d57133
 
6.0%
n56344
 
6.0%
o50256
 
5.3%
l43292
 
4.6%
m38264
 
4.0%
Other values (25)149775
15.9%

Term
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 6.4 MiB

Value | Count
36 | 87224
60 | 24228
12 | 1614

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Characters and Unicode

Total characters: 226132
Distinct characters: 5
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 36
2nd row: 36
3rd row: 36
4th row: 36
5th row: 36

Common Values

Value | Count | Frequency (%)
36 | 87224 | 77.1%
60 | 24228 | 21.4%
12 | 1614 | 1.4%
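
Because Term is profiled as Categorical, the report gives no numeric summary for it even though its three labels are numbers. As a rough sketch, treating the labels as integers allows a count-weighted mean to be computed directly from the table. The report does not state the unit of Term, so none is assumed here.

```python
term_counts = {"36": 87224, "60": 24228, "12": 1614}  # label -> count, from the report

total = sum(term_counts.values())
weighted_sum = sum(int(label) * n for label, n in term_counts.items())
mean_term = weighted_sum / total
most_common = max(term_counts, key=term_counts.get)

print(total)                 # 113066
print(round(mean_term, 1))   # 40.8
print(most_common)           # 36
```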

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
3687224
77.1%
6024228
 
21.4%
121614
 
1.4%

Most occurring characters

ValueCountFrequency (%)
6111452
49.3%
387224
38.6%
024228
 
10.7%
11614
 
0.7%
21614
 
0.7%

Most occurring categories

ValueCountFrequency (%)
Decimal Number226132
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
6111452
49.3%
387224
38.6%
024228
 
10.7%
11614
 
0.7%
21614
 
0.7%

Most occurring scripts

ValueCountFrequency (%)
Common226132
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
6111452
49.3%
387224
38.6%
024228
 
10.7%
11614
 
0.7%
21614
 
0.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII226132
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
6111452
49.3%
387224
38.6%
024228
 
10.7%
11614
 
0.7%
21614
 
0.7%

LoanOriginalAmount
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 2468
Distinct (%): 2.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 8314.762307
Minimum: 1000
Maximum: 35000
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 883.5 KiB

Quantile statistics

Minimum: 1000
5-th percentile: 1500
Q1: 4000
median: 6300
Q3: 12000
95-th percentile: 20000
Maximum: 35000
Range: 34000
Interquartile range (IQR): 8000

Descriptive statistics

Standard deviation: 6237.007841
Coefficient of variation (CV): 0.7501125842
Kurtosis: 1.331303374
Mean: 8314.762307
Median Absolute Deviation (MAD): 3700
Skewness: 1.224284612
Sum: 940116915
Variance: 38900266.81
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
400014207
 
12.6%
1500012232
 
10.8%
1000010956
 
9.7%
50006953
 
6.1%
20006042
 
5.3%
30005728
 
5.1%
250003588
 
3.2%
200003234
 
2.9%
10003206
 
2.8%
25002990
 
2.6%
Other values (2458)43930
38.9%
ValueCountFrequency (%)
10003206
2.8%
10018
 
< 0.1%
10052
 
< 0.1%
10101
 
< 0.1%
102533
 
< 0.1%
10306
 
< 0.1%
10312
 
< 0.1%
10321
 
< 0.1%
10351
 
< 0.1%
10361
 
< 0.1%
ValueCountFrequency (%)
35000418
0.4%
349993
 
< 0.1%
347001
 
< 0.1%
346791
 
< 0.1%
340005
 
< 0.1%
337502
 
< 0.1%
337101
 
< 0.1%
335002
 
< 0.1%
334111
 
< 0.1%
330005
 
< 0.1%

MonthlyLoanPayment
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 23567
Distinct (%): 20.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 271.9327422
Minimum: 0
Maximum: 2251.51
Zeros: 935
Zeros (%): 0.8%
Negative: 0
Negative (%): 0.0%
Memory size: 883.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 52.55
Q1: 130.95
median: 217.37
Q3: 370.57
95-th percentile: 633.655
Maximum: 2251.51
Range: 2251.51
Interquartile range (IQR): 239.62

Descriptive statistics

Standard deviation: 192.5499791
Coefficient of variation (CV): 0.7080794225
Kurtosis: 3.154029249
Mean: 271.9327422
Median Absolute Deviation (MAD): 109.29
Skewness: 1.41627881
Sum: 30746347.43
Variance: 37075.49443
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
173.712423
 
2.1%
0935
 
0.8%
172.76530
 
0.5%
86.85472
 
0.4%
174.2460
 
0.4%
130.28370
 
0.3%
163.28285
 
0.3%
326.62280
 
0.2%
136.98277
 
0.2%
165.15271
 
0.2%
Other values (23557)106763
94.4%
ValueCountFrequency (%)
0935
0.8%
0.151
 
< 0.1%
0.161
 
< 0.1%
0.231
 
< 0.1%
0.241
 
< 0.1%
0.291
 
< 0.1%
0.441
 
< 0.1%
0.531
 
< 0.1%
0.581
 
< 0.1%
0.921
 
< 0.1%
ValueCountFrequency (%)
2251.511
< 0.1%
2218.531
< 0.1%
2179.221
< 0.1%
2163.631
< 0.1%
2153.381
< 0.1%
2147.641
< 0.1%
2134.061
< 0.1%
2111.781
< 0.1%
1808.841
< 0.1%
1781.281
< 0.1%

ListingCategory (numeric)
Real number (ℝ≥0)

ZEROS

Distinct: 21
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.776838307
Minimum: 0
Maximum: 20
Zeros: 16965
Zeros (%): 15.0%
Negative: 0
Negative (%): 0.0%
Memory size: 883.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1
median: 1
Q3: 3
95-th percentile: 13
Maximum: 20
Range: 20
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 3.998187782
Coefficient of variation (CV): 1.439834567
Kurtosis: 5.823824174
Mean: 2.776838307
Median Absolute Deviation (MAD): 0
Skewness: 2.443517343
Sum: 313966
Variance: 15.98550554
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=21)
ValueCountFrequency (%)
157624
51.0%
016965
 
15.0%
710448
 
9.2%
27388
 
6.5%
37157
 
6.3%
62568
 
2.3%
42395
 
2.1%
131987
 
1.8%
151507
 
1.3%
18882
 
0.8%
Other values (11)4145
 
3.7%
ValueCountFrequency (%)
016965
 
15.0%
157624
51.0%
27388
 
6.5%
37157
 
6.3%
42395
 
2.1%
5756
 
0.7%
62568
 
2.3%
710448
 
9.2%
8196
 
0.2%
985
 
0.1%
ValueCountFrequency (%)
20762
 
0.7%
19764
 
0.7%
18882
0.8%
1752
 
< 0.1%
16304
 
0.3%
151507
1.3%
14863
0.8%
131987
1.8%
1258
 
0.1%
11214
 
0.2%

BorrowerAPR
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 6677
Distinct (%): 5.9%
Missing: 25
Missing (%): < 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.2189803536
Minimum: 0.00653
Maximum: 0.51229
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 883.5 KiB

Quantile statistics

Minimum: 0.00653
5-th percentile: 0.09434
Q1: 0.15629
median: 0.20984
Q3: 0.28386
95-th percentile: 0.35797
Maximum: 0.51229
Range: 0.50576
Interquartile range (IQR): 0.12757

Descriptive statistics

Standard deviation: 0.08048277631
Coefficient of variation (CV): 0.3675342331
Kurtosis: -0.8839274884
Mean: 0.2189803536
Median Absolute Deviation (MAD): 0.06233
Skewness: 0.2209129987
Sum: 24753.75815
Variance: 0.006477477283
Monotonicity: Not monotonic
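
BorrowerAPR has 25 missing values, and the reported mean and sum are consistent with statistics computed over the non-missing rows only: dividing the sum by the mean recovers the number of rows that actually contributed. A quick check with the figures above:

```python
n_rows, n_missing = 113066, 25
total, mean = 24753.75815, 0.2189803536

n_present = n_rows - n_missing
n_from_stats = round(total / mean)   # rows that contributed to the mean

print(n_present)      # 113041
print(n_from_stats)   # 113041
```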
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
0.357973672
 
3.2%
0.356431644
 
1.5%
0.374531260
 
1.1%
0.30532902
 
0.8%
0.2951747
 
0.7%
0.35356715
 
0.6%
0.29776707
 
0.6%
0.15833642
 
0.6%
0.24246605
 
0.5%
0.24758601
 
0.5%
Other values (6667)101546
89.8%
ValueCountFrequency (%)
0.006532
< 0.1%
0.008641
 
< 0.1%
0.013152
< 0.1%
0.013251
 
< 0.1%
0.015481
 
< 0.1%
0.016471
 
< 0.1%
0.01652
< 0.1%
0.016573
< 0.1%
0.018231
 
< 0.1%
0.018751
 
< 0.1%
ValueCountFrequency (%)
0.512291
 
< 0.1%
0.506331
 
< 0.1%
0.488731
 
< 0.1%
0.462011
 
< 0.1%
0.458572
 
< 0.1%
0.423951
 
< 0.1%
0.4135555
< 0.1%
0.408312
 
< 0.1%
0.407454
 
< 0.1%
0.4067911
 
< 0.1%

BorrowerRate
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 2294
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.1929457428
Minimum: 0
Maximum: 0.4975
Zeros: 8
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 883.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0.082
Q1: 0.134
median: 0.184
Q3: 0.2506
95-th percentile: 0.3177
Maximum: 0.4975
Range: 0.4975
Interquartile range (IQR): 0.1166

Descriptive statistics

Standard deviation: 0.07491660314
Coefficient of variation (CV): 0.3882780831
Kurtosis: -0.9115075572
Mean: 0.1929457428
Median Absolute Deviation (MAD): 0.0579
Skewness: 0.2723358036
Sum: 21815.60335
Variance: 0.005612497425
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
0.31773672
 
3.2%
0.351905
 
1.7%
0.31991651
 
1.5%
0.291508
 
1.3%
0.26991314
 
1.2%
0.151174
 
1.0%
0.141022
 
0.9%
0.1099928
 
0.8%
0.2907
 
0.8%
0.18791
 
0.7%
Other values (2284)98194
86.8%
ValueCountFrequency (%)
08
< 0.1%
0.00011
 
< 0.1%
0.00051
 
< 0.1%
0.00211
 
< 0.1%
0.0051
 
< 0.1%
0.00991
 
< 0.1%
0.0111
< 0.1%
0.01151
 
< 0.1%
0.0151
 
< 0.1%
0.02951
 
< 0.1%
ValueCountFrequency (%)
0.49752
 
< 0.1%
0.481
 
< 0.1%
0.453
 
< 0.1%
0.42
 
< 0.1%
0.3751
 
< 0.1%
0.3617
 
< 0.1%
0.357521
 
< 0.1%
0.3572
 
< 0.1%
0.3531
 
< 0.1%
0.351905
1.7%

CreditGrade
Categorical

HIGH CORRELATION
MISSING

Distinct: 8
Distinct (%): < 0.1%
Missing: 84113
Missing (%): 74.4%
Memory size: 4.2 MiB

Value | Count
C | 5649
D | 5153
B | 4389
AA | 3509
HR | 3508
Other values (3) | 6745

Length

Max length: 2
Median length: 1
Mean length: 1.247228267
Min length: 1

Characters and Unicode

Total characters: 36111
Distinct characters: 8
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: C
2nd row: HR
3rd row: C
4th row: AA
5th row: D

Common Values

Value | Count | Frequency (%)
C | 5649 | 5.0%
D | 5153 | 4.6%
B | 4389 | 3.9%
AA | 3509 | 3.1%
HR | 3508 | 3.1%
A | 3315 | 2.9%
E | 3289 | 2.9%
NC | 141 | 0.1%
(Missing) | 84113 | 74.4%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
c5649
19.5%
d5153
17.8%
b4389
15.2%
aa3509
12.1%
hr3508
12.1%
a3315
11.4%
e3289
11.4%
nc141
 
0.5%

Most occurring characters

ValueCountFrequency (%)
A10333
28.6%
C5790
16.0%
D5153
14.3%
B4389
12.2%
H3508
 
9.7%
R3508
 
9.7%
E3289
 
9.1%
N141
 
0.4%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter36111
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
A10333
28.6%
C5790
16.0%
D5153
14.3%
B4389
12.2%
H3508
 
9.7%
R3508
 
9.7%
E3289
 
9.1%
N141
 
0.4%

Most occurring scripts

ValueCountFrequency (%)
Latin36111
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
A10333
28.6%
C5790
16.0%
D5153
14.3%
B4389
12.2%
H3508
 
9.7%
R3508
 
9.7%
E3289
 
9.1%
N141
 
0.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII36111
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
A10333
28.6%
C5790
16.0%
D5153
14.3%
B4389
12.2%
H3508
 
9.7%
R3508
 
9.7%
E3289
 
9.1%
N141
 
0.4%

ProsperRating (Alpha)
Categorical

HIGH CORRELATION
MISSING

Distinct: 7
Distinct (%): < 0.1%
Missing: 29084
Missing (%): 25.7%
Memory size: 5.5 MiB

Value | Count
C | 18096
B | 15368
A | 14390
D | 14170
E | 9716
Other values (2) | 12242

Length

Max length: 2
Median length: 1
Mean length: 1.145769332
Min length: 1

Characters and Unicode

Total characters: 96224
Distinct characters: 7
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: A
2nd row: A
3rd row: D
4th row: B
5th row: E

Common Values

Value | Count | Frequency (%)
C | 18096 | 16.0%
B | 15368 | 13.6%
A | 14390 | 12.7%
D | 14170 | 12.5%
E | 9716 | 8.6%
HR | 6917 | 6.1%
AA | 5325 | 4.7%
(Missing) | 29084 | 25.7%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
c18096
21.5%
b15368
18.3%
a14390
17.1%
d14170
16.9%
e9716
11.6%
hr6917
 
8.2%
aa5325
 
6.3%

Most occurring characters

ValueCountFrequency (%)
A25040
26.0%
C18096
18.8%
B15368
16.0%
D14170
14.7%
E9716
 
10.1%
H6917
 
7.2%
R6917
 
7.2%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter96224
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
A25040
26.0%
C18096
18.8%
B15368
16.0%
D14170
14.7%
E9716
 
10.1%
H6917
 
7.2%
R6917
 
7.2%

Most occurring scripts

ValueCountFrequency (%)
Latin96224
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
A25040
26.0%
C18096
18.8%
B15368
16.0%
D14170
14.7%
E9716
 
10.1%
H6917
 
7.2%
R6917
 
7.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII96224
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
A25040
26.0%
C18096
18.8%
B15368
16.0%
D14170
14.7%
E9716
 
10.1%
H6917
 
7.2%
R6917
 
7.2%

CreditScoreRangeLower
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 26
Distinct (%): < 0.1%
Missing: 591
Missing (%): 0.5%
Infinite: 0
Infinite (%): 0.0%
Mean: 685.5249611
Minimum: 0
Maximum: 880
Zeros: 133
Zeros (%): 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 883.5 KiB

Quantile statistics

Minimum0
5-th percentile560
Q1660
median680
Q3720
95-th percentile780
Maximum880
Range880
Interquartile range (IQR)60

Descriptive statistics

Standard deviation: 66.63589474
Coefficient of variation (CV): 0.09720418441
Kurtosis: 13.22534937
Mean: 685.5249611
Median Absolute Deviation (MAD): 40
Skewness: -1.587483879
Sum: 77104420
Variance: 4440.342467
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=26)
Value | Count | Frequency (%)
680 | 16315 | 14.4%
660 | 16177 | 14.3%
700 | 15315 | 13.5%
720 | 12797 | 11.3%
640 | 12099 | 10.7%
740 | 9211 | 8.1%
760 | 6566 | 5.8%
780 | 4607 | 4.1%
620 | 4172 | 3.7%
600 | 3601 | 3.2%
Other values (16) | 11615 | 10.3%
Value | Count | Frequency (%)
0 | 133 | 0.1%
360 | 1 | < 0.1%
420 | 5 | < 0.1%
440 | 36 | < 0.1%
460 | 141 | 0.1%
480 | 346 | 0.3%
500 | 554 | 0.5%
520 | 1593 | 1.4%
540 | 1474 | 1.3%
560 | 1357 | 1.2%
Value | Count | Frequency (%)
880 | 27 | < 0.1%
860 | 212 | 0.2%
840 | 567 | 0.5%
820 | 1408 | 1.2%
800 | 2636 | 2.3%
780 | 4607 | 4.1%
760 | 6566 | 5.8%
740 | 9211 | 8.1%
720 | 12797 | 11.3%
700 | 15315 | 13.5%

CreditScoreRangeUpper
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 26
Distinct (%): < 0.1%
Missing: 591
Missing (%): 0.5%
Infinite: 0
Infinite (%): 0.0%
Mean: 704.5249611
Minimum: 19
Maximum: 899
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 883.5 KiB

Quantile statistics

Minimum: 19
5th percentile: 579
Q1: 679
Median: 699
Q3: 739
95th percentile: 799
Maximum: 899
Range: 880
Interquartile range (IQR): 60

Descriptive statistics

Standard deviation: 66.63589474
Coefficient of variation (CV): 0.0945827308
Kurtosis: 13.22534937
Mean: 704.5249611
Median Absolute Deviation (MAD): 40
Skewness: -1.587483879
Sum: 79241445
Variance: 4440.342467
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=26)
Value | Count | Frequency (%)
699 | 16315 | 14.4%
679 | 16177 | 14.3%
719 | 15315 | 13.5%
739 | 12797 | 11.3%
659 | 12099 | 10.7%
759 | 9211 | 8.1%
779 | 6566 | 5.8%
799 | 4607 | 4.1%
639 | 4172 | 3.7%
619 | 3601 | 3.2%
Other values (16) | 11615 | 10.3%
Value | Count | Frequency (%)
19 | 133 | 0.1%
379 | 1 | < 0.1%
439 | 5 | < 0.1%
459 | 36 | < 0.1%
479 | 141 | 0.1%
499 | 346 | 0.3%
519 | 554 | 0.5%
539 | 1593 | 1.4%
559 | 1474 | 1.3%
579 | 1357 | 1.2%
Value | Count | Frequency (%)
899 | 27 | < 0.1%
879 | 212 | 0.2%
859 | 567 | 0.5%
839 | 1408 | 1.2%
819 | 2636 | 2.3%
799 | 4607 | 4.1%
779 | 6566 | 5.8%
759 | 9211 | 8.1%
739 | 12797 | 11.3%
719 | 15315 | 13.5%

IncomeRange
Categorical

HIGH CORRELATION

Distinct: 8
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 7.5 MiB

Value | Count
$25,000-49,999 | 31940
$50,000-74,999 | 30749
$100,000+ | 17188
$75,000-99,999 | 16780
Not displayed | 7741
Other values (3) | 8668

Length

Max length: 14
Median length: 14
Mean length: 12.77107176
Min length: 2

Characters and Unicode

Total characters: 1443974
Distinct characters: 24
Distinct categories: 8
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: $25,000-49,999
2nd row: $50,000-74,999
3rd row: Not displayed
4th row: $25,000-49,999
5th row: $100,000+

Common Values

Value | Count | Frequency (%)
$25,000-49,999 | 31940 | 28.2%
$50,000-74,999 | 30749 | 27.2%
$100,000+ | 17188 | 15.2%
$75,000-99,999 | 16780 | 14.8%
Not displayed | 7741 | 6.8%
$1-24,999 | 7241 | 6.4%
Not employed | 806 | 0.7%
$0 | 621 | 0.5%

Length

Histogram of lengths of the category

Category Frequency Plot

Value | Count | Frequency (%)
25,000-49,999 | 31940 | 26.3%
50,000-74,999 | 30749 | 25.3%
100,000 | 17188 | 14.1%
75,000-99,999 | 16780 | 13.8%
not | 8547 | 7.0%
displayed | 7741 | 6.4%
1-24,999 | 7241 | 6.0%
employed | 806 | 0.7%
0 | 621 | 0.5%

Most occurring characters

ValueCountFrequency (%)
0355717
24.6%
9325630
22.6%
,183367
12.7%
$104519
 
7.2%
-86710
 
6.0%
579469
 
5.5%
469930
 
4.8%
747529
 
3.3%
239181
 
2.7%
124429
 
1.7%
Other values (14)127493
 
8.8%

Most occurring categories

ValueCountFrequency (%)
Decimal Number941885
65.2%
Other Punctuation183367
 
12.7%
Currency Symbol104519
 
7.2%
Lowercase Letter93211
 
6.5%
Dash Punctuation86710
 
6.0%
Math Symbol17188
 
1.2%
Space Separator8547
 
0.6%
Uppercase Letter8547
 
0.6%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
d16288
17.5%
e9353
10.0%
o9353
10.0%
t8547
9.2%
p8547
9.2%
l8547
9.2%
y8547
9.2%
i7741
8.3%
s7741
8.3%
a7741
8.3%
Decimal Number
ValueCountFrequency (%)
0355717
37.8%
9325630
34.6%
579469
 
8.4%
469930
 
7.4%
747529
 
5.0%
239181
 
4.2%
124429
 
2.6%
Other Punctuation
ValueCountFrequency (%)
,183367
100.0%
Currency Symbol
ValueCountFrequency (%)
$104519
100.0%
Dash Punctuation
ValueCountFrequency (%)
-86710
100.0%
Math Symbol
ValueCountFrequency (%)
+17188
100.0%
Space Separator
ValueCountFrequency (%)
8547
100.0%
Uppercase Letter
ValueCountFrequency (%)
N8547
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common1342216
93.0%
Latin101758
 
7.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0355717
26.5%
9325630
24.3%
,183367
13.7%
$104519
 
7.8%
-86710
 
6.5%
579469
 
5.9%
469930
 
5.2%
747529
 
3.5%
239181
 
2.9%
124429
 
1.8%
Other values (2)25735
 
1.9%
Latin
ValueCountFrequency (%)
d16288
16.0%
e9353
9.2%
o9353
9.2%
t8547
8.4%
p8547
8.4%
l8547
8.4%
y8547
8.4%
N8547
8.4%
i7741
7.6%
s7741
7.6%
Other values (2)8547
8.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII1443974
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0355717
24.6%
9325630
22.6%
,183367
12.7%
$104519
 
7.2%
-86710
 
6.0%
579469
 
5.5%
469930
 
4.8%
747529
 
3.3%
239181
 
2.7%
124429
 
1.7%
Other values (14)127493
 
8.8%

IncomeVerifiable
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 110.5 KiB

Value | Count | Frequency (%)
True | 104479 | 92.4%
False | 8587 | 7.6%

StatedMonthlyIncome
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct: 13502
Distinct (%): 11.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5605.11958
Minimum: 0
Maximum: 1750002.917
Zeros: 1394
Zeros (%): 1.2%
Negative: 0
Negative (%): 0.0%
Memory size: 883.5 KiB

Quantile statistics

Minimum: 0
5th percentile: 1533
Q1: 3199.395833
Median: 4666.666667
Q3: 6824.6875
95th percentile: 12250
Maximum: 1750002.917
Range: 1750002.917
Interquartile range (IQR): 3625.291667

Descriptive statistics

Standard deviation: 7495.595563
Coefficient of variation (CV): 1.337276655
Kurtosis: 26784.24094
Mean: 5605.11958
Median Absolute Deviation (MAD): 1750
Skewness: 125.0987676
Sum: 633748450.4
Variance: 56183952.84
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
4166.666667 | 3486 | 3.1%
5000 | 3367 | 3.0%
3333.333333 | 2889 | 2.6%
3750 | 2399 | 2.1%
5416.666667 | 2351 | 2.1%
5833.333333 | 2284 | 2.0%
6250 | 2255 | 2.0%
2500 | 2238 | 2.0%
4583.333333 | 2186 | 1.9%
6666.666667 | 2139 | 1.9%
Other values (13492) | 87472 | 77.4%
Value | Count | Frequency (%)
0 | 1394 | 1.2%
0.083333 | 251 | 0.2%
0.25 | 1 | < 0.1%
0.833333 | 1 | < 0.1%
1.416667 | 1 | < 0.1%
1.666667 | 1 | < 0.1%
1.833333 | 2 | < 0.1%
1.916667 | 1 | < 0.1%
2.083333 | 1 | < 0.1%
2.166667 | 1 | < 0.1%
Value | Count | Frequency (%)
1750002.917 | 1 | < 0.1%
618547.8333 | 1 | < 0.1%
483333.3333 | 1 | < 0.1%
466666.6667 | 1 | < 0.1%
416666.6667 | 1 | < 0.1%
394400 | 1 | < 0.1%
250000 | 1 | < 0.1%
208333.3333 | 1 | < 0.1%
185081.75 | 2 | < 0.1%
158333.3333 | 1 | < 0.1%

DebtToIncomeRatio
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct: 1207
Distinct (%): 1.2%
Missing: 8472
Missing (%): 7.5%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.2760324777
Minimum: 0
Maximum: 10.01
Zeros: 19
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 883.5 KiB

Quantile statistics

Minimum: 0
5th percentile: 0.06
Q1: 0.14
Median: 0.22
Q3: 0.32
95th percentile: 0.51
Maximum: 10.01
Range: 10.01
Interquartile range (IQR): 0.18

Descriptive statistics

Standard deviation: 0.5537376038
Coefficient of variation (CV): 2.006059607
Kurtosis: 259.5241037
Mean: 0.2760324777
Median Absolute Deviation (MAD): 0.08
Skewness: 15.38596561
Sum: 28871.34097
Variance: 0.3066253338
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
0.18 | 4103 | 3.6%
0.22 | 3662 | 3.2%
0.17 | 3595 | 3.2%
0.14 | 3531 | 3.1%
0.2 | 3445 | 3.0%
0.16 | 3409 | 3.0%
0.19 | 3372 | 3.0%
0.15 | 3321 | 2.9%
0.21 | 3193 | 2.8%
0.13 | 3152 | 2.8%
Other values (1197) | 69811 | 61.7%
(Missing) | 8472 | 7.5%
Value | Count | Frequency (%)
0 | 19 | < 0.1%
0.00044 | 1 | < 0.1%
0.0031 | 1 | < 0.1%
0.00611 | 1 | < 0.1%
0.00647 | 1 | < 0.1%
0.00677 | 1 | < 0.1%
0.00722 | 1 | < 0.1%
0.01 | 250 | 0.2%
0.01042 | 1 | < 0.1%
0.01051 | 1 | < 0.1%
Value | Count | Frequency (%)
10.01 | 272 | 0.2%
9.77 | 1 | < 0.1%
9.44 | 1 | < 0.1%
9.2 | 1 | < 0.1%
9.06 | 1 | < 0.1%
8.63 | 1 | < 0.1%
8.53 | 1 | < 0.1%
8.52 | 1 | < 0.1%
8.27 | 1 | < 0.1%
8.13 | 1 | < 0.1%

EmploymentStatus
Categorical

HIGH CORRELATION
MISSING

Distinct: 8
Distinct (%): < 0.1%
Missing: 2255
Missing (%): 2.0%
Memory size: 7.0 MiB

Value | Count
Employed | 66598
Full-time | 26354
Self-employed | 6052
Not available | 5347
Other | 3742
Other values (3) | 2718

Length

Max length: 13
Median length: 8
Mean length: 8.68365054
Min length: 5

Characters and Unicode

Total characters: 962244
Distinct characters: 25
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Self-employed
2nd row: Employed
3rd row: Not available
4th row: Employed
5th row: Employed

Common Values

Value | Count | Frequency (%)
Employed | 66598 | 58.9%
Full-time | 26354 | 23.3%
Self-employed | 6052 | 5.4%
Not available | 5347 | 4.7%
Other | 3742 | 3.3%
Part-time | 1088 | 1.0%
Not employed | 835 | 0.7%
Retired | 795 | 0.7%
(Missing) | 2255 | 2.0%

Length

Histogram of lengths of the category

Category Frequency Plot

Value | Count | Frequency (%)
employed | 67433 | 57.6%
full-time | 26354 | 22.5%
not | 6182 | 5.3%
self-employed | 6052 | 5.2%
available | 5347 | 4.6%
other | 3742 | 3.2%
part-time | 1088 | 0.9%
retired | 795 | 0.7%

Most occurring characters

ValueCountFrequency (%)
l142939
14.9%
e124545
12.9%
m100927
10.5%
o79667
8.3%
d74280
7.7%
p73485
7.6%
y73485
7.6%
E66598
6.9%
t39249
 
4.1%
i33584
 
3.5%
Other values (15)153485
16.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter811757
84.4%
Uppercase Letter110811
 
11.5%
Dash Punctuation33494
 
3.5%
Space Separator6182
 
0.6%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
l142939
17.6%
e124545
15.3%
m100927
12.4%
o79667
9.8%
d74280
9.2%
p73485
9.1%
y73485
9.1%
t39249
 
4.8%
i33584
 
4.1%
u26354
 
3.2%
Other values (6)43242
 
5.3%
Uppercase Letter
ValueCountFrequency (%)
E66598
60.1%
F26354
 
23.8%
N6182
 
5.6%
S6052
 
5.5%
O3742
 
3.4%
P1088
 
1.0%
R795
 
0.7%
Dash Punctuation
ValueCountFrequency (%)
-33494
100.0%
Space Separator
ValueCountFrequency (%)
6182
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin922568
95.9%
Common39676
 
4.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
l142939
15.5%
e124545
13.5%
m100927
10.9%
o79667
8.6%
d74280
8.1%
p73485
8.0%
y73485
8.0%
E66598
7.2%
t39249
 
4.3%
i33584
 
3.6%
Other values (13)113809
12.3%
Common
ValueCountFrequency (%)
-33494
84.4%
6182
 
15.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII962244
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
l142939
14.9%
e124545
12.9%
m100927
10.5%
o79667
8.3%
d74280
7.7%
p73485
7.6%
y73485
7.6%
E66598
6.9%
t39249
 
4.1%
i33584
 
3.5%
Other values (15)153485
16.0%

EmploymentStatusDuration
Real number (ℝ≥0)

MISSING
ZEROS

Distinct: 605
Distinct (%): 0.6%
Missing: 7625
Missing (%): 6.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 96.06058364
Minimum: 0
Maximum: 755
Zeros: 1503
Zeros (%): 1.3%
Negative: 0
Negative (%): 0.0%
Memory size: 883.5 KiB

Quantile statistics

Minimum: 0
5th percentile: 4
Q1: 26
Median: 67
Q3: 137
95th percentile: 297
Maximum: 755
Range: 755
Interquartile range (IQR): 111

Descriptive statistics

Standard deviation: 94.43224105
Coefficient of variation (CV): 0.9830487956
Kurtosis: 2.72591678
Mean: 96.06058364
Median Absolute Deviation (MAD): 48
Skewness: 1.581477048
Sum: 10128724
Variance: 8917.448151
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
0 | 1503 | 1.3%
4 | 1177 | 1.0%
1 | 1171 | 1.0%
3 | 1166 | 1.0%
5 | 1147 | 1.0%
2 | 1144 | 1.0%
7 | 1102 | 1.0%
8 | 1097 | 1.0%
6 | 1093 | 1.0%
12 | 1071 | 0.9%
Other values (595) | 93770 | 82.9%
(Missing) | 7625 | 6.7%
Value | Count | Frequency (%)
0 | 1503 | 1.3%
1 | 1171 | 1.0%
2 | 1144 | 1.0%
3 | 1166 | 1.0%
4 | 1177 | 1.0%
5 | 1147 | 1.0%
6 | 1093 | 1.0%
7 | 1102 | 1.0%
8 | 1097 | 1.0%
9 | 1022 | 0.9%
Value | Count | Frequency (%)
755 | 1 | < 0.1%
745 | 1 | < 0.1%
733 | 1 | < 0.1%
732 | 1 | < 0.1%
731 | 1 | < 0.1%
690 | 1 | < 0.1%
685 | 1 | < 0.1%
678 | 1 | < 0.1%
672 | 1 | < 0.1%
662 | 1 | < 0.1%

IsBorrowerHomeowner
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 110.5 KiB

Value | Count | Frequency (%)
True | 57052 | 50.5%
False | 56014 | 49.5%

Interactions

[Figure: 121 pairwise interaction plots (images not reproduced).]

Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
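
As a rough illustration of the rank-then-correlate recipe above, here is a minimal pure-Python sketch. The helper names (`average_ranks`, `pearson`, `spearman`) and the choice to give tied values their average rank are assumptions made for this example; the report does not show how its own tool computes ρ.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def spearman(xs, ys):
    """Spearman's rho: Pearson's r applied to the rank variables."""
    return pearson(average_ranks(xs), average_ranks(ys))


# A monotonic but nonlinear relation (y = x squared) still gives rho close to 1.
rho = spearman([1, 2, 3, 4, 5], [1, 4, 9, 16, 25])
```

Because only the ordering of the values matters, any strictly increasing transformation of either variable leaves ρ unchanged.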

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
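
The formula can be sketched in a few lines of plain Python. The function name `pearson` is hypothetical. The five value pairs are the CreditScoreRangeLower and CreditScoreRangeUpper entries from the first rows of the sample at the end of this report; in those rows each upper bound is the lower bound plus 19, so r should come out as 1 up to floating-point rounding.

```python
from statistics import mean


def pearson(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    # The common 1/n factors cancel between numerator and denominator.
    return cov / (sx * sy)


# Values copied from the first five sample rows of this report.
lower = [640.0, 680.0, 480.0, 800.0, 680.0]
upper = [659.0, 699.0, 499.0, 819.0, 699.0]

r = pearson(lower, upper)

# Shifting and positively rescaling one variable should leave r unchanged.
r_rescaled = pearson(lower, [2 * u + 3 for u in upper])
```

Rescaling by a negative factor would flip the sign of r but not its magnitude.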

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
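
A minimal sketch of the pair-counting definition, assuming the simplest variant (often called tau-a) in which tied pairs count as neither concordant nor discordant. The function name `kendall_tau_a` is hypothetical, and statistical libraries commonly default to a tie-corrected variant instead, so results can differ on data with many ties.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs; ties count as neither."""
    concordant = 0
    discordant = 0
    pairs = list(combinations(range(len(xs)), 2))
    for i, j in pairs:
        # A pair is concordant when both variables move in the same direction.
        direction = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)


# Three observations give three pairs: two concordant and one discordant.
tau = kendall_tau_a([1, 2, 3], [1, 3, 2])
```

The double loop over pairs is quadratic in the number of observations, which is fine for illustration but slow for a dataset of this size.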

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. The coefficient is documented extensively by the phik project.

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
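
One way to read "nullity correlation" is as the ordinary correlation between two missing/present indicator columns. The sketch below works under that assumption; the function names are hypothetical and the report does not confirm that its heatmap is computed exactly this way. The two short columns are the CreditGrade and ProsperRating (Alpha) entries from the first five sample rows of this report, with NaN written as `None`.

```python
def missing_flags(column):
    """Map a column to 1.0 where the value is missing (None) and 0.0 elsewhere."""
    return [1.0 if value is None else 0.0 for value in column]


def nullity_correlation(col_a, col_b):
    """Pearson correlation between the missingness indicators of two columns."""
    a, b = missing_flags(col_a), missing_flags(col_b)
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        # A column that is always present or always missing has no defined correlation.
        return None
    return cov / (var_a * var_b) ** 0.5


# In these rows exactly one of the two ratings is present per loan.
credit_grade = ["C", None, "HR", None, None]
prosper_rating = [None, "A", None, "A", "D"]

result = nullity_correlation(credit_grade, prosper_rating)
```

A value near -1 means one column tends to be filled exactly when the other is empty, and a value near +1 means the two tend to be missing together.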

Sample

First rows

(index) | df_index | ListingKey | ListingCreationDate | ClosedDate | LoanStatus | Term | LoanOriginalAmount | MonthlyLoanPayment | ListingCategory (numeric) | BorrowerAPR | BorrowerRate | CreditGrade | ProsperRating (Alpha) | CreditScoreRangeLower | CreditScoreRangeUpper | IncomeRange | IncomeVerifiable | StatedMonthlyIncome | DebtToIncomeRatio | EmploymentStatus | EmploymentStatusDuration | IsBorrowerHomeowner
0 | 0 | 1021339766868145413AB3B | 2007-08-26 19:09:29.263000000 | 2009-08-14 00:00:00 | Completed | 36 | 9425 | 330.43 | 0 | 0.16516 | 0.1580 | C | NaN | 640.0 | 659.0 | $25,000-49,999 | True | 3083.333333 | 0.17 | Self-employed | 2.0 | True
1 | 1 | 10273602499503308B223C1 | 2014-02-27 08:28:07.900000000 | NaN | Current | 36 | 10000 | 318.93 | 2 | 0.12016 | 0.0920 | NaN | A | 680.0 | 699.0 | $50,000-74,999 | True | 6125.000000 | 0.18 | Employed | 44.0 | False
2 | 2 | 0EE9337825851032864889A | 2007-01-05 15:00:47.090000000 | 2009-12-17 00:00:00 | Completed | 36 | 3001 | 123.32 | 0 | 0.28269 | 0.2750 | HR | NaN | 480.0 | 499.0 | Not displayed | True | 2083.333333 | 0.06 | Not available | NaN | False
3 | 3 | 0EF5356002482715299901A | 2012-10-22 11:02:35.010000000 | NaN | Current | 36 | 10000 | 321.45 | 16 | 0.12528 | 0.0974 | NaN | A | 800.0 | 819.0 | $25,000-49,999 | True | 2875.000000 | 0.15 | Employed | 113.0 | True
4 | 4 | 0F023589499656230C5E3E2 | 2013-09-14 18:38:39.097000000 | NaN | Current | 36 | 15000 | 563.97 | 2 | 0.24614 | 0.2085 | NaN | D | 680.0 | 699.0 | $100,000+ | True | 9583.333333 | 0.26 | Employed | 44.0 | True
5 | 5 | 0F05359734824199381F61D | 2013-12-14 08:26:37.093000000 | NaN | Current | 60 | 15000 | 342.37 | 1 | 0.15425 | 0.1314 | NaN | B | 740.0 | 759.0 | $100,000+ | True | 8333.333333 | 0.36 | Employed | 82.0 | True
6 | 6 | 0F0A3576754255009D63151 | 2013-04-12 09:52:56.147000000 | NaN | Current | 36 | 3000 | 122.67 | 1 | 0.31032 | 0.2712 | NaN | E | 680.0 | 699.0 | $25,000-49,999 | True | 2083.333333 | 0.27 | Employed | 172.0 | False
7 | 7 | 0F1035772717087366F9EA7 | 2013-05-05 06:49:27.493000000 | NaN | Current | 36 | 10000 | 372.60 | 2 | 0.23939 | 0.2019 | NaN | C | 700.0 | 719.0 | $25,000-49,999 | True | 3355.750000 | 0.24 | Employed | 103.0 | False
8 | 8 | 0F043596202561788EA13D5 | 2013-12-02 10:43:39.117000000 | NaN | Current | 36 | 10000 | 305.54 | 7 | 0.07620 | 0.0629 | NaN | AA | 820.0 | 839.0 | $25,000-49,999 | True | 3333.333333 | 0.25 | Employed | 269.0 | True
9 | 10 | 0F123545674891886D9F106 | 2012-05-10 07:04:01.577000000 | NaN | Current | 60 | 13500 | 395.37 | 1 | 0.27462 | 0.2489 | NaN | C | 640.0 | 659.0 | $75,000-99,999 | True | 7500.000000 | 0.12 | Employed | 300.0 | False

Last rows

(index) | df_index | ListingKey | ListingCreationDate | ClosedDate | LoanStatus | Term | LoanOriginalAmount | MonthlyLoanPayment | ListingCategory (numeric) | BorrowerAPR | BorrowerRate | CreditGrade | ProsperRating (Alpha) | CreditScoreRangeLower | CreditScoreRangeUpper | IncomeRange | IncomeVerifiable | StatedMonthlyIncome | DebtToIncomeRatio | EmploymentStatus | EmploymentStatusDuration | IsBorrowerHomeowner
113056 | 113927 | E3433419834735803891976 | 2008-04-30 21:25:19.670000000 | 2011-05-09 00:00:00 | Completed | 36 | 4292 | 132.11 | 4 | 0.07469 | 0.0679 | AA | NaN | 760.0 | 779.0 | $100,000+ | True | 10333.333333 | 0.06 | Full-time | 69.0 | True
113057 | 113928 | E34935176664905343E01EA | 2011-06-06 19:02:44.443000000 | 2011-09-19 00:00:00 | Completed | 36 | 2000 | 73.30 | 3 | 0.22362 | 0.1899 | NaN | C | 740.0 | 759.0 | $25,000-49,999 | True | 2333.333333 | 0.27 | Full-time | 22.0 | False
113058 | 113929 | E3553583161337791FCB87F | 2013-07-06 17:40:01.657000000 | 2014-02-07 00:00:00 | Completed | 36 | 2500 | 101.25 | 2 | 0.30285 | 0.2639 | NaN | E | 660.0 | 679.0 | $50,000-74,999 | True | 4333.333333 | 0.05 | Employed | 25.0 | False
113059 | 113930 | E35D3584034795373BCD69A | 2013-07-08 10:24:49.700000000 | NaN | Current | 36 | 3000 | 106.05 | 1 | 0.20053 | 0.1639 | NaN | B | 680.0 | 699.0 | $75,000-99,999 | True | 6250.000000 | 0.20 | Employed | 85.0 | True
113060 | 113931 | E36F36005339663245C20F8 | 2014-01-16 20:13:08.040000000 | NaN | Current | 60 | 25000 | 565.50 | 3 | 0.15016 | 0.1274 | NaN | B | 800.0 | 819.0 | $75,000-99,999 | True | 8146.666667 | 0.28 | Employed | 12.0 | False
113061 | 113932 | E6D9357655724827169606C | 2013-04-14 05:55:02.663000000 | NaN | Current | 36 | 10000 | 364.74 | 1 | 0.22354 | 0.1864 | NaN | C | 700.0 | 719.0 | $50,000-74,999 | True | 4333.333333 | 0.13 | Employed | 246.0 | True
113062 | 113933 | E6DB353036033497292EE43 | 2011-11-03 20:42:55.333000000 | NaN | FinalPaymentInProgress | 36 | 2000 | 65.57 | 7 | 0.13220 | 0.1110 | NaN | A | 700.0 | 719.0 | $75,000-99,999 | True | 8041.666667 | 0.11 | Employed | 21.0 | True
113063 | 113934 | E6E13596170052029692BB1 | 2013-12-13 05:49:12.703000000 | NaN | Current | 60 | 10000 | 273.35 | 1 | 0.23984 | 0.2150 | NaN | D | 700.0 | 719.0 | $25,000-49,999 | True | 2875.000000 | 0.51 | Employed | 84.0 | True
113064 | 113935 | E6EB3531504622671970D9E | 2011-11-14 13:18:26.597000000 | 2013-08-13 00:00:00 | Completed | 60 | 15000 | 449.55 | 2 | 0.28408 | 0.2605 | NaN | C | 680.0 | 699.0 | $25,000-49,999 | True | 3875.000000 | 0.48 | Full-time | 94.0 | True
113065 | 113936 | E6ED3600409833199F711B7 | 2014-01-15 09:27:37.657000000 | NaN | Current | 36 | 2000 | 64.90 | 1 | 0.13189 | 0.1039 | NaN | A | 680.0 | 699.0 | $50,000-74,999 | True | 4583.333333 | 0.23 | Employed | 244.0 | False